An experiment in authorship attribution

Authors

  • Harald Baayen
  • Hans van Halteren
  • Anneke Neijt
  • Fiona Tweedie
Abstract

This paper reports an experiment in authorship attribution that reveals considerable authorial structure in texts written by authors with very similar backgrounds and training, with genre and topic strictly controlled for. We interpret our results as supporting the hypothesis that authors have 'textual fingerprints', at least for texts produced by authors who are not consciously changing their style of writing across texts. The study also shows that discriminant analysis is a more appropriate technique than principal components analysis for predicting the authorship of an unknown (held-out) text from known (training) texts whose authorship is available. Finally, standard discriminant analysis can be enhanced considerably by using an entropy-based weighting scheme of the kind used in latent semantic analysis (Landauer et al., 1998).
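The abstract combines two ingredients: discriminant analysis trained on texts of known authorship to classify a held-out text, and an entropy-based feature weighting borrowed from latent semantic analysis. The exact weighting formula is not given here, so the sketch below assumes the standard LSA log-entropy scheme (local weight log(1 + tf), global weight 1 + Σ_j p_ij log p_ij / log n) and a textbook linear discriminant with a pooled within-author covariance. The function names, the small `ridge` term, and the toy count matrix are illustrative assumptions, not the authors' implementation or data.

```python
import numpy as np


def entropy_global_weights(counts):
    """Global entropy weight per feature from a (texts x features) count matrix.

    Assumption: standard LSA log-entropy weighting; the paper's variant may differ.
    g_i = 1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i and
    n > 1 texts. Evenly spread features get ~0; features in one text get 1.
    """
    counts = np.asarray(counts, dtype=float)
    n_texts = counts.shape[0]
    gf = counts.sum(axis=0)  # global frequency of each feature
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    # p * log(p), using the convention 0 * log(0) = 0
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return 1.0 + plogp.sum(axis=0) / np.log(n_texts)


def log_entropy(counts, global_w):
    """Local log weight times global weights estimated on the training texts."""
    return np.log1p(np.asarray(counts, dtype=float)) * global_w


def fit_lda(X, y, ridge=1e-6):
    """Linear discriminant analysis with a pooled within-author covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    cov += ridge * np.eye(X.shape[1])  # hypothetical ridge: keeps cov invertible
    inv = np.linalg.inv(cov)
    log_priors = np.log(np.array([np.mean(y == c) for c in classes]))
    return classes, means, inv, log_priors


def predict_lda(model, X):
    """Assign each held-out text to the author with the highest discriminant score."""
    classes, means, inv, log_priors = model
    # delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log pi_k
    scores = (X @ inv @ means.T
              - 0.5 * np.einsum("kd,de,ke->k", means, inv, means)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]


# Toy (made-up) counts: rows are texts, columns are features such as word counts.
train_counts = np.array([[10, 1, 5], [9, 2, 5], [11, 1, 5],
                         [1, 10, 5], [2, 9, 5], [1, 11, 5]])
authors = np.array(["A", "A", "A", "B", "B", "B"])
gw = entropy_global_weights(train_counts)
model = fit_lda(log_entropy(train_counts, gw), authors)
held_out = np.array([[10, 2, 5], [2, 10, 5]])
print(predict_lda(model, log_entropy(held_out, gw)))  # ['A' 'B']
```

Note that the third toy feature occurs equally often in every training text, so its global weight is about 0 and it contributes nothing to the discriminant; the global weights are estimated on the training texts only and then reused for the held-out texts.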


Similar sources

Authorship Attribution Using Text Distortion

Authorship attribution is associated with important applications in forensics and humanities research. A crucial point in this field is to quantify the personal style of writing, ideally in a way that is not affected by changes in topic or genre. In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting st...


Domain Independent Authorship Attribution without Domain Adaptation

Automatic authorship attribution, by its nature, is much more advantageous if it is domain (i.e., topic and/or genre) independent. That is, many real world problems that require authorship attribution may not have in-domain training data readily available. However, most previous work based on machine learning techniques focused only on in-domain text for authorship attribution. In this paper, w...


An Extremely Simple Authorship Attribution System

In this paper we present a very simple yet effective algorithm for authorship attribution. By this term we mean the act of telling whether a certain text was or was not written by a certain author. We shall not discuss the advantages or applications of this activity, but we propose a method for doing it in an automatic and instantaneous way, neither considering the language of the texts nor und...


Questioned Electronic Documents : Empirical Studies in Authorship Attribution

Forensic analysis of questioned electronic documents is very difficult, because the nature of the documents eliminates many kinds of informative differences. Recent work in authorship attribution demonstrates the practicality of analyzing documents based on authorial style, but the state of the art is confusing. Analyses are difficult to apply, little is known about type or rate of errors, and ...


N-gram-based Author Profiles for Authorship Attribution

We present a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method in 1976. The existing approaches to automated authorship attribution implicitly build author profiles as vectors of feature weights, as language models, or similar. Our approach is based on byte-level n-grams, it is l...


Authorship Attribution Using Word Network Features

In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends to show complex network structure at word level, with low degrees of separation and scale-free (power law) degree distribution. There has also been work on ...



Publication date: 2002